The following packages are used:
library(dplyr)
library(ggplot2)
library(jsonlite)
library(knitr)
library(readr)
library(stringr)
library(tidyr)
library(zoo)
Hearthstone is a popular collectible card game published by Blizzard Entertainment in 2014, which is based on the Warcraft series by the same company. The goal of the game is to build a deck of 30 cards and win by defeating the opponent’s or getting the opponent to concede first.
Cards can be classified according to the following categories:
There are three datasets:
data.csv contains a list of decks submitted by players to HearthPwn from 2013 (pre-launch) to 2017,refs.json contains detailed information about all cards (both collectible and non-collectible) up to March 2017.
cards_collectible.json contains detailed information about all the cards that are collectible in the game.
Which are the most popular cards used in Ranked decks?
We focus on the Ranked format where players get to decide which cards to include in their deck, therefore the cards’ popularity are more accurately represented, and the gameplay is not subject to additional constraints that other game modes (like Tavern Brawls and Adventures) may impose.
How do we determine popularity?
We determine popularity by the number of decks that include at least 1 copy in the starting 30 cards (not generated by other effects).
A deck can include at most 2 copies of any card (1 for Legendary cards), thus a card’s popularity is not heavily influenced by the number of copies players wish to use.
Possible biases to consider
Since Neutral cards can be used by multiple classes, they should be more popular than Class-specific cards.
For the Wild format, cards from the older sets may be more popular simply because they have been in the game longer.
For the Standard format, cards from the Basic and Classic sets will be more popular because they do not rotate out of the format unlike expansion cards.
If a certain card becomes too popular (i.e. the community thinks players must include it in their decks), it reduces the card variety in the metagame and makes gameplay frustrating for other players (amongst other consequences). In the long term, this may lead to player attrition and loss of potential revenue (when players purchase card packs or other cosmetics).
Historically, Blizzard has dealt with problematic cards in one of several ways:
data_path <- file.path("..", "data")
decks_raw <- read_csv(file.path(data_path, "data.csv"))
cards_raw <- fromJSON(file.path(data_path, "cards_collectible.json"), flatten = TRUE)
The decks_raw data has 346232 rows and 41 columns. The first 11 columns describe the deck’s attributes (like date submitted, class, deck format) while the remaining 30 columns describe the cards each deck contains (based on the card’s unique ID which can be referenced from the cards_raw data).
Detailed information on the variables can be found on the Kaggle dataset.
names(decks_raw)
## [1] "craft_cost" "date" "deck_archetype" "deck_class"
## [5] "deck_format" "deck_id" "deck_set" "deck_type"
## [9] "rating" "title" "user" "card_0"
## [13] "card_1" "card_2" "card_3" "card_4"
## [17] "card_5" "card_6" "card_7" "card_8"
## [21] "card_9" "card_10" "card_11" "card_12"
## [25] "card_13" "card_14" "card_15" "card_16"
## [29] "card_17" "card_18" "card_19" "card_20"
## [33] "card_21" "card_22" "card_23" "card_24"
## [37] "card_25" "card_26" "card_27" "card_28"
## [41] "card_29"
# code is not run
glimpse(decks_raw)
There are 8 rows that contain missing data. The missing values are in the 10th column, which contains the decks’ title as submitted by the users, so they can be safely ignored.
sum(!complete.cases(decks_raw))
## [1] 8
which(is.na(decks_raw), arr.ind = TRUE)
## row col
## [1,] 16747 10
## [2,] 175608 10
## [3,] 216047 10
## [4,] 238021 10
## [5,] 278491 10
## [6,] 326192 10
## [7,] 329285 10
## [8,] 329286 10
Additional information on the variables can be found on HearthstoneJSON.
There are two identifier fields for the cards: a character/string id and an integer dbfId. The decks_raw dataset uses the integer IDs to reference cards used.
dim(cards_raw)
## [1] 1751 65
# code is not run
names(cards_raw)
# code is not run
glimpse(cards_raw)
The following computes the number of missing values in each field, with the exception of those that are present as lists or data frames (mechanics, referencedTags, classes, entourage)
# code is not run
sapply(cards_raw, function(x) sum(is.na(x)))
The raw dataset is split into two, one containing the deck attributes and the other containing the deck composition (cards), with deck_id acting as the unique identifier. We also exclude decks created before launch (there were many card changes in the alpha and beta stages, making card popularity very volatile).
The decks_comp data will be pivoted to long format later on, thus excluding fields that are not related to the cards will minimize the size of the dataset.
launch_date <- as.Date("2014-03-11")
decks_attr <- decks_raw %>%
filter(date >= launch_date) %>%
select(deck_id, craft_cost:deck_format, deck_set:user)
decks_comp <- decks_raw %>%
filter(date >= launch_date) %>%
select(deck_id, card_0:card_29)
Within decks_attr, the factor/enumerated columns are identified and recast accordingly.
fct_cols_attr <- c("deck_archetype", "deck_class", "deck_format", "deck_set", "deck_type")
decks_attr[fct_cols_attr] <- lapply(decks_attr[fct_cols_attr], factor)
rm(fct_cols_attr)
While each deck has a submission date, we may also be interested in grouping the decks by month (which corresponds to Ranked seasons) and by year (which is marked by expansion release dates instead of calendar dates).
Years in the game based on a time period that:
So based on the release dates, the years would be:
# month (with year)
decks_attr$hsmonth <- as.yearmon(decks_attr$date)
# year (not by calendar)
decks_attr$hsyear <- case_when(
decks_attr$date <= as.Date("2015-04-01") ~ "2014",
decks_attr$date >= as.Date("2015-04-02") &
decks_attr$date <= as.Date("2016-04-25") ~ "2015",
decks_attr$date >= as.Date("2016-04-26") ~ "2016"
) %>%
factor()
The Standard and Wild formats were formally introduced into the game on 2016-04-26 with the release of Whispers of the Old Gods; however the graphic below shows that many decks from June 2014 to April 2016 were marked as Wild.
We can simply relabel all decks created before 2016-04-26 as Standard since all cards before then are not separated by format:
decks_attr$deck_format[decks_attr$date < as.Date("2016-04-26")] <- "S"
A summary of the processed data is shown below:
## deck_id craft_cost date
## Min. : 36923 Min. : 0 Min. :2014-03-11
## 1st Qu.:253573 1st Qu.: 2840 1st Qu.:2015-05-26
## Median :428597 Median : 5120 Median :2016-02-09
## Mean :419989 Mean : 5745 Mean :2015-12-21
## 3rd Qu.:603508 3rd Qu.: 7840 3rd Qu.:2016-08-09
## Max. :749548 Max. :48000 Max. :2017-03-19
##
## deck_archetype deck_class deck_format
## Unknown :220501 Mage :42230 S:307743
## Midrange Shaman: 5472 Priest :41756 W: 16361
## Control Priest : 5135 Paladin:39368
## Control Warrior: 4939 Warlock:35598
## Tempo Mage : 4545 Druid :35488
## Midrange Hunter: 4371 Shaman :33969
## (Other) : 79141 (Other):95695
## deck_set deck_type rating
## Explorers : 57307 Arena : 8178 Min. : 0.000
## Old Gods : 49895 None : 75120 1st Qu.: 1.000
## Blackrock Launch: 38900 PvE Adventure: 9059 Median : 1.000
## Gadgetzan : 31329 Ranked Deck :202104 Mean : 2.777
## Naxx Launch : 22283 Tavern Brawl : 6360 3rd Qu.: 1.000
## Yogg Nerf : 22175 Theorycraft : 19686 Max. :4016.000
## (Other) :102215 Tournament : 3597
## title user hsmonth hsyear
## Length:324104 Length:324104 Min. :2014 2014: 65119
## Class :character Class :character 1st Qu.:2015 2015:128062
## Mode :character Mode :character Median :2016 2016:130923
## Mean :2016
## 3rd Qu.:2017
## Max. :2017
##
Many columns in the cards_raw data pertain to the card’s stats, mechanics and play requirements, which is better explained by the text on the card image. So we choose to only include the columns that we consider are core properties of the card (and not sufficiently explained by the card text):
cards_simple <- select(cards_raw,
dbfId, name, cost, cardClass,
rarity, type, set, collectible, id)
## Observations: 1,751
## Variables: 9
## $ dbfId <int> 2539, 2541, 2545, 2572, 2542, 2549, 2571, 2544, 25...
## $ name <chr> "Flame Lance", "Effigy", "Fallen Hero", "Arcane Bl...
## $ cost <int> 5, 3, 2, 1, 3, 4, 3, 6, 8, 5, 4, 4, 1, 3, 2, 2, 4,...
## $ cardClass <chr> "MAGE", "MAGE", "MAGE", "MAGE", "MAGE", "MAGE", "M...
## $ rarity <chr> "COMMON", "RARE", "RARE", "EPIC", "RARE", "COMMON"...
## $ type <chr> "SPELL", "SPELL", "MINION", "SPELL", "SPELL", "MIN...
## $ set <chr> "TGT", "TGT", "TGT", "TGT", "TGT", "TGT", "TGT", "...
## $ collectible <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR...
## $ id <chr> "AT_001", "AT_002", "AT_003", "AT_004", "AT_005", ...
The set column contains abbreviated names or nicknames and is not necessarily informative; we create a new column that uses the actual names of the card sets:
unique(cards_simple$set)
## [1] "TGT" "BOOMSDAY" "BRM" "GANGS"
## [5] "CORE" "EXPERT1" "HOF" "NAXX"
## [9] "GILNEAS" "GVG" "HERO_SKINS" "ICECROWN"
## [13] "KARA" "LOE" "LOOTAPALOOZA" "OG"
## [17] "UNGORO"
# list for recoding card sets
cardset_lst <- list(
"CORE" = "Basic",
"EXPERT1" = "Classic",
# 2014 sets
"NAXX" = "Curse of Naxxramas",
"GVG" = "Goblins vs Gnomes",
# 2015 sets
"BRM" = "Blackrock Mountain",
"TGT" = "The Grand Tournament",
"LOE" = "League of Explorers",
# 2016 sets
"OG" = "Whispers of the Old Gods",
"KARA" = "One Night in Karazhan",
"GANGS" = "Mean Streets of Gadgetzan",
"HOF" = "Hall of Fame",
# 2017 sets
"UNGORO" = "Journey to Un'Goro",
"ICECROWN" = "Knights of the Frozen Throne",
"LOOTAPALOOZA" = "Kobolds & Catacombs",
# 2018 sets
"GILNEAS" = "The Witchwood",
"BOOMSDAY" = "The Boomsday Project"
)
cards_simple$card_set <- recode(cards_simple$set, !!!cardset_lst,
.default = "Other")
# drop the set column to avoid confusion (it remains in the cards_raw dataset)
cards_simple$set <- NULL
# remove list (no longer needed)
rm(cardset_lst)
Some columns are entirely uppercase, which we convert to title case for readability:
titlecase_cols <- c("cardClass", "rarity", "type")
cards_simple[titlecase_cols] <- lapply(cards_simple[titlecase_cols],
str_to_title)
rm(titlecase_cols)
The factor/enumerated columns are then identified and recast accordingly.
fct_cols_cards <- c("cardClass", "rarity", "type", "card_set")
cards_simple[fct_cols_cards] <- lapply(cards_simple[fct_cols_cards], factor)
rm(fct_cols_cards)
A summary of the processed data is shown below:
## dbfId name cost cardClass
## Min. : 7 Length:1751 Min. : 0.000 Neutral:657
## 1st Qu.: 1987 Class :character 1st Qu.: 2.000 Paladin:123
## Median :38957 Mode :character Median : 4.000 Hunter :122
## Mean :25375 Mean : 3.856 Mage :122
## 3rd Qu.:43163 3rd Qu.: 5.000 Warlock:122
## Max. :53187 Max. :20.000 Druid :121
## NA's :22 (Other):484
## rarity type collectible id
## Common :612 Hero : 33 Mode:logical Length:1751
## Epic :298 Minion:1192 TRUE:1751 Class :character
## Free :142 Spell : 471 Mode :character
## Legendary:253 Weapon: 55
## Rare :446
##
##
## card_set
## Classic :236
## Basic :142
## Journey to Un'Goro :135
## Knights of the Frozen Throne:135
## Kobolds & Catacombs :135
## The Boomsday Project :135
## (Other) :833
As the decks are generated by human input, and there are multiple cards with the same name, it is recommended to check for cards that have the same name but wrong ID.
Specifically, we are looking for the version of each card that is collectible (since all cards used in Ranked decks must be collectible). This step requires loading the full card data (which contains non-collectible cards).
# IDs of all cards used
cards_used <- decks_comp %>%
select(card_0:card_29) %>%
unlist(use.names = FALSE) %>% # flatten into a vector
unique() %>%
sort()
# IDs of missing cards (using dbfId)
missing_cards <- cards_used[!cards_used %in% cards_simple$dbfId]
# how many cards are not found in the collectible cards dataset?
length(missing_cards)
## [1] 15
# data for all cards
cards_all_raw <- fromJSON(file.path(data_path, "refs.json"), flatten = TRUE)
# filter for those missing cards only
mssng_cards <- cards_all_raw %>%
select(dbfId, name, cardClass, type, collectible) %>%
filter(dbfId %in% missing_cards)
To find the correct IDs, we join them by name to the cards_simple. The dbfID.x on the left would be replaced by the dbfID.y on the right:
mislabelled <- mssng_cards %>%
select(dbfId, name) %>%
# also drops uncollectible mislabelled cards (which are not in simple_cards)
inner_join(cards_simple, by = "name") %>%
arrange(name)
mislabelled
| dbfId.x | name | dbfId.y | cost | cardClass | rarity | type | collectible | id | card_set |
|---|---|---|---|---|---|---|---|---|---|
| 40341 | Cleave | 940 | 2 | Warrior | Free | Spell | TRUE | CS2_114 | Basic |
| 2177 | Dark Wispers | 2009 | 6 | Druid | Epic | Spell | TRUE | GVG_041 | Goblins vs Gnomes |
| 42146 | Doppelgangster | 40953 | 5 | Neutral | Rare | Minion | TRUE | CFM_668 | Mean Streets of Gadgetzan |
| 38319 | Druid of the Claw | 692 | 5 | Druid | Common | Minion | TRUE | EX1_165 | Classic |
| 2230 | Druid of the Fang | 2048 | 5 | Druid | Common | Minion | TRUE | GVG_080 | Goblins vs Gnomes |
| 2310 | Druid of the Flame | 2292 | 3 | Druid | Common | Minion | TRUE | BRM_010 | Blackrock Mountain |
| 40402 | Evolve | 38266 | 1 | Shaman | Rare | Spell | TRUE | OG_027 | Whispers of the Old Gods |
| 41409 | Jade Idol | 40372 | 1 | Druid | Rare | Spell | TRUE | CFM_602 | Mean Streets of Gadgetzan |
| 468 | Mark of Nature | 151 | 3 | Druid | Common | Spell | TRUE | EX1_155 | Classic |
| 41609 | Nefarian | 2261 | 9 | Neutral | Legendary | Minion | TRUE | BRM_030 | Blackrock Mountain |
| 38113 | Raven Idol | 13335 | 1 | Druid | Common | Spell | TRUE | LOE_115 | League of Explorers |
| 1161 | Starfall | 86 | 5 | Druid | Rare | Spell | TRUE | NEW1_007 | Classic |
| 38710 | Unstable Portal | 1929 | 2 | Mage | Rare | Spell | TRUE | GVG_003 | Goblins vs Gnomes |
| 38653 | Wisp | 179 | 0 | Neutral | Common | Minion | TRUE | CS2_231 | Classic |
| 137 | Wrath | 836 | 2 | Druid | Common | Spell | TRUE | EX1_154 | Classic |
We create a named list that can be used within recode():
# list values are correct ids
mislab_recode <- as.list(mislabelled$dbfId.y)
# list names are mislabelled ids
names(mislab_recode) <- mislabelled$dbfId.x
# intermediate objects no longer needed
rm(cards_used, missing_cards, cards_all_raw, mssng_cards)
The following items may be used for plotting:
# class colours
class_colors <- c(
"Druid" = "#FF7D0A",
"Hunter" = "#228B22", #"#ABD473",
"Mage" = "#40C7EB",
"Paladin" = "#F58CBA",
"Priest" = "#FFFFFF",
"Rogue" = "#FFF569",
"Shaman" = "#0070DE",
"Warlock" = "#8787ED",
"Warrior" = "#C79C6E",
"Neutral" = "#777777"
)
# rarity colours
rarity_colors <- c(
"Free" = NA,
"Common" = "#000000",
"Rare" = "#1E90FF",
"Epic" = "#9932CC",
"Legendary" = "#FFB90F"
)
# release dates of card sets
release_dates <- c(
"Launch" = "2014-03-11",
"Naxxramas" = "2014-07-22",
"GvG" = "2014-12-08",
"Blackrock" = "2015-04-02",
"TGT" = "2015-08-24",
"Explorers" = "2015-11-12",
"Old Gods" = "2016-04-26",
"Karazhan" = "2016-08-11",
"Gadgetzan" = "2016-12-01"
)
# darker plot theme for this doc
theme_doc <- theme(panel.spacing = unit(1, "points"),
panel.grid.major.y = element_blank(),
plot.title = element_text(size = 12),
axis.text.y = element_text(size = 9),
panel.background = element_rect(fill = "#C3C3C3"))
The question we are trying to answer is
Which are the most popular cards used in Ranked decks (Standard format)?
We measure card popularity by counting the number of decks that include at least one copy of that card.
To begin, we filter only for Ranked Decks in Standard format, pick out the variables of interest, join the decks to their compositions, and pivot the card columns.
ranked_deck_cards <- decks_attr %>%
filter(deck_type == "Ranked Deck",
deck_format == "S") %>%
select(deck_id, deck_class,
date, hsmonth, hsyear,
deck_format, deck_set, craft_cost) %>%
inner_join(decks_comp, by = "deck_id") %>%
gather(card_no, card_id, card_0:card_29) %>%
select(-card_no) %>% # drop the key column which is unnecessary
# select unique combinations of deck and card (ignore duplicates)
distinct(deck_id, card_id, .keep_all = TRUE)
glimpse(ranked_deck_cards)
## Observations: 3,806,924
## Variables: 9
## $ deck_id <int> 433003, 432997, 432995, 432992, 432990, 432989, 43...
## $ deck_class <fct> Warrior, Paladin, Warrior, Priest, Mage, Mage, Rog...
## $ date <date> 2016-02-19, 2016-02-19, 2016-02-19, 2016-02-19, 2...
## $ hsmonth <S3: yearmon> Feb 2016, Feb 2016, Feb 2016, Feb 2016, Fe...
## $ hsyear <fct> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 20...
## $ deck_format <fct> S, S, S, S, S, S, S, S, S, S, S, S, S, S, S, S, S,...
## $ deck_set <fct> Explorers, Explorers, Explorers, Explorers, Explor...
## $ craft_cost <int> 9840, 7700, 5740, 8780, 2800, 0, 2980, 1540, 4440,...
## $ card_id <int> 75, 476, 75, 272, 113, 77, 251, 242, 290, 180, 476...
Before any further analysis, we want to check how many cards in this dataset are mislabelled:
sum(!(ranked_deck_cards$card_id) %in% cards_simple$dbfId)
## [1] 61269
We use the list mislab_recode to relabel those cards:
ranked_deck_cards$card_id <- recode(ranked_deck_cards$card_id, !!!mislab_recode)
# check again
sum(!(ranked_deck_cards$card_id) %in% cards_simple$dbfId)
## [1] 0
To derive overall card popularity, we just count the number of times each card_id appears, then join it to the cards_simple data to retrieve the card details. After that, we can sort by number of decks:
# create set of ranked standard decks for pct calculation
rs_decks_all <- decks_attr %>%
filter(deck_type == "Ranked Deck",
deck_format == "S")
pop_cards_all <- ranked_deck_cards %>%
group_by(card_id) %>%
summarise(n_decks = n(),
pct_all = n_decks * 100 / nrow(rs_decks_all)) %>%
inner_join(cards_simple, by = c("card_id" = "dbfId"))
As the table below shows, the most popular cards are expected to be Neutral class since they can be used by all nine classes in the game, as opposed to Class-specific cards:
| Name | Class | Rarity | Card Set | No. of Decks | % of All Decks |
|---|---|---|---|---|---|
| Azure Drake | Neutral | Rare | Hall of Fame | 62327 | 32.2 |
| Sylvanas Windrunner | Neutral | Legendary | Hall of Fame | 46499 | 24.0 |
| Emperor Thaurissan | Neutral | Legendary | Blackrock Mountain | 37614 | 19.4 |
| Dr. Boom | Neutral | Legendary | Goblins vs Gnomes | 37033 | 19.1 |
| Sludge Belcher | Neutral | Rare | Curse of Naxxramas | 34718 | 17.9 |
| Defender of Argus | Neutral | Rare | Classic | 31081 | 16.1 |
| Ragnaros the Firelord | Neutral | Legendary | Hall of Fame | 29656 | 15.3 |
| Piloted Shredder | Neutral | Common | Goblins vs Gnomes | 28434 | 14.7 |
| Acolyte of Pain | Neutral | Common | Classic | 28340 | 14.6 |
| Bloodmage Thalnos | Neutral | Legendary | Classic | 28227 | 14.6 |
Which abilities/properties makes these cards popular?
As the Hall of Fame was only created in April 2017; the popularity of those three cards (Azure Drake, Sylvanas Windrunner, Ragnaros the Firelord) may have been one of the factors behind Blizzard’s decision to move them out of Standard format and into the Hall of Fame.
The plot below shows the relative popularity of all cards in the dataset:
So we can follow up by asking:
Amongst class-specific cards, which is the most popular card for each class?
By tallying the total number of decks by class, we can later evaluate for each class, the percentage of decks which use the most popular class-specific cards. After filtering out the Neutral cards, we can look for the most popular class-specific cards:
rs_decks_class <- rs_decks_all %>%
group_by(deck_class) %>%
summarise(tot_decks_class = n())
pop_cards_class <- pop_cards_all %>%
filter(!is.na(cardClass), cardClass != "Neutral") %>%
select(cardClass, name, rarity, card_set, n_decks) %>%
inner_join(rs_decks_class, by = c("cardClass" = "deck_class")) %>%
mutate(pct_class = n_decks * 100 / tot_decks_class)
The most popular class-specific cards are as follows:
| Class | Name | Rarity | Card Set | No. of Decks | Total No. of Class Decks | % of Class Decks |
|---|---|---|---|---|---|---|
| Druid | Swipe | Free | Basic | 18802 | 20163 | 93.3 |
| Hunter | Animal Companion | Free | Basic | 15772 | 18226 | 86.5 |
| Mage | Frostbolt | Free | Basic | 22374 | 23850 | 93.8 |
| Paladin | Truesilver Champion | Free | Basic | 20395 | 23742 | 85.9 |
| Priest | Shadow Word: Death | Free | Basic | 21863 | 24396 | 89.6 |
| Rogue | Eviscerate | Common | Classic | 17717 | 19058 | 93.0 |
| Shaman | Hex | Free | Basic | 16880 | 21099 | 80.0 |
| Warlock | Power Overwhelming | Common | Hall of Fame | 13400 | 21268 | 63.0 |
| Warrior | Fiery War Axe | Free | Basic | 20856 | 21776 | 95.8 |
Which abilities/properties makes the above cards popular?
Some of the other popular cards (in the following plot) also have effects like card draw or area of effect.
While Frostbolt appeared in the largest number of decks, many of the most popular cards were included in more than 90% of decks from their class. The only exception would be Power Overwhelimg (Warlock) which only appeared in 63% of Warlock decks.
As the table shows, cards from the Basic and Classic sets tend to be more popular because they do not rotate out of Standard format, and they have been around since the launch of Hearthstone.
The following plot shows the relative popularity of the top 3 cards for each class. Dashed lines indicate the total number of decks representing each class in the dataset:
## Warning: Removed 21 rows containing missing values (geom_point).
Our decks_attr data spans 3 years from March 2014 to March 2017. Recall that years are defined by release dates of card sets instead of calendar months:
To derive card popularity, we count the number of times each card_id appears in each hsyear:
rs_decks_year <- rs_decks_all %>%
group_by(hsyear) %>%
summarise(tot_decks_year = n())
pop_cards_year <- ranked_deck_cards %>%
group_by(hsyear, card_id) %>%
summarise(n_decks = n()) %>%
inner_join(cards_simple, by = c("card_id" = "dbfId"))
The following table shows the top 3 cards for each year. As expected, Neutral cards are used more often than class-specific cards. At the same time, the most popular card in 2015 and 2016 (Dr. Boom and Azure Drake) were considered even more essential than the others, considering that they were included in more than 40% of decks submitted:
| Year | Name | Rarity | Card Set | No. of Decks | Total No. of Decks | % of Decks |
|---|---|---|---|---|---|---|
| 2014 | Sludge Belcher | Rare | Curse of Naxxramas | 10490 | 35791 | 29.3 |
| 2014 | Defender of Argus | Rare | Classic | 8832 | 35791 | 24.7 |
| 2014 | Harvest Golem | Common | Classic | 8814 | 35791 | 24.6 |
| 2015 | Dr. Boom | Legendary | Goblins vs Gnomes | 31770 | 73406 | 43.3 |
| 2015 | Sludge Belcher | Rare | Curse of Naxxramas | 24227 | 73406 | 33.0 |
| 2015 | Piloted Shredder | Common | Goblins vs Gnomes | 23333 | 73406 | 31.8 |
| 2016 | Azure Drake | Rare | Hall of Fame | 34079 | 84381 | 40.4 |
| 2016 | Sylvanas Windrunner | Legendary | Hall of Fame | 21198 | 84381 | 25.1 |
| 2016 | Brann Bronzebeard | Legendary | League of Explorers | 19699 | 84381 | 23.3 |
Which abilities/properties makes the above cards popular?
On the other hand, the Ranked ladder resets to mark a new Season at the start of every calendar month. Players take this opportunity to improve upon their previous ranking, and they may try to include new cards to overcome the popular decks that they tend to encounter.
From the graph above, there is usually a spike in the number of decks submitted when a new card set is released (marked with dark lines), which gradually tapers off until the next card set is released.
A possible question to ask may be:
Are there any shifts in card popularity in the three months following the release of the Old Gods card set (2016-04-26)?
Looking at the number of decks submitted every day for the first three months following 26th April in the plot below, the number of decks submitted (and therefore the number of cards included) sharply tapers off after the first week.
pop_cards_2016_aprjul <- ranked_deck_cards %>%
filter(date >= as.Date("2016-04-26"),
date <= as.Date("2016-07-25")) %>%
group_by(hsmonth, card_id) %>%
summarise(n_decks = n()) %>%
inner_join(cards_simple, by = c("card_id" = "dbfId"))
While cards from the latest card set were popular in the first week (still in April), cards from the older card sets (such as Classic) returned to the top spots in subsequent months. However, since the data concerns decks submitted (not necessarily played in games) we cannot infer that the new cards fell in popularity, only that decks submitted later favoured including older cards over the new ones.
| Month | Name | Card Set | No. of Decks | Total No. of Decks | % of Decks |
|---|---|---|---|---|---|
| Apr 2016 | C’Thun | Whispers of the Old Gods | 2126 | 6721 | 31.6 |
| Apr 2016 | Azure Drake | Hall of Fame | 2110 | 6721 | 31.4 |
| Apr 2016 | Sylvanas Windrunner | Hall of Fame | 1984 | 6721 | 29.5 |
| May 2016 | Sylvanas Windrunner | Hall of Fame | 3742 | 10546 | 35.5 |
| May 2016 | Azure Drake | Hall of Fame | 3333 | 10546 | 31.6 |
| May 2016 | Brann Bronzebeard | League of Explorers | 2204 | 10546 | 20.9 |
| Jun 2016 | Sylvanas Windrunner | Hall of Fame | 1563 | 4776 | 32.7 |
| Jun 2016 | Azure Drake | Hall of Fame | 1526 | 4776 | 32.0 |
| Jun 2016 | Emperor Thaurissan | Blackrock Mountain | 979 | 4776 | 20.5 |
| Jul 2016 | Azure Drake | Hall of Fame | 1001 | 2858 | 35.0 |
| Jul 2016 | Sylvanas Windrunner | Hall of Fame | 953 | 2858 | 33.3 |
| Jul 2016 | Emperor Thaurissan | Blackrock Mountain | 638 | 2858 | 22.3 |
Which were the most popular cards (Neutral and Class-specific) for each card set?
The first table shows the most popular Neutral card from each card set, with many of them being Legendary:
| Card Set | Name | Class | Rarity | No. of Decks |
|---|---|---|---|---|
| Basic | Acidic Swamp Ooze | Neutral | Free | 17107 |
| Blackrock Mountain | Emperor Thaurissan | Neutral | Legendary | 37614 |
| Classic | Defender of Argus | Neutral | Rare | 31081 |
| Curse of Naxxramas | Sludge Belcher | Neutral | Rare | 34718 |
| Goblins vs Gnomes | Dr. Boom | Neutral | Legendary | 37033 |
| Hall of Fame | Azure Drake | Neutral | Rare | 62327 |
| League of Explorers | Brann Bronzebeard | Neutral | Legendary | 25885 |
| Mean Streets of Gadgetzan | Kazakus | Neutral | Legendary | 5308 |
| Mean Streets of Gadgetzan | Aya Blackpaw | Neutral | Legendary | 5308 |
| One Night in Karazhan | Barnes | Neutral | Legendary | 9228 |
| The Grand Tournament | Justicar Trueheart | Neutral | Legendary | 15013 |
| Whispers of the Old Gods | N’Zoth, the Corruptor | Neutral | Legendary | 11764 |
The following plot shows the relative popularity of Neutral cards in each card set. Notably, the Basic, Blackrock Mountain and Karazhan sets have an overwhelming favourite (with a large difference over the 2nd most popular card):
## Warning: Removed 4 rows containing missing values (geom_point).
The next table shows the most popular Class-specific cards from each card set.
| Card Set | Name | Class | Type | Rarity | No. of Decks |
|---|---|---|---|---|---|
| Basic | Frostbolt | Mage | Spell | Free | 22374 |
| Blackrock Mountain | Imp Gang Boss | Warlock | Minion | Common | 11162 |
| Classic | Eviscerate | Rogue | Spell | Common | 17717 |
| Curse of Naxxramas | Death’s Bite | Warrior | Weapon | Common | 9012 |
| Goblins vs Gnomes | Shielded Minibot | Paladin | Minion | Common | 9620 |
| Hall of Fame | Power Overwhelming | Warlock | Spell | Common | 13400 |
| League of Explorers | Entomb | Priest | Spell | Common | 10527 |
| Mean Streets of Gadgetzan | Dragonfire Potion | Priest | Spell | Epic | 3410 |
| One Night in Karazhan | Maelstrom Portal | Shaman | Spell | Rare | 5505 |
| The Grand Tournament | Totem Golem | Shaman | Minion | Common | 10933 |
| Whispers of the Old Gods | Thing from Below | Shaman | Minion | Rare | 7361 |
The following plot shows the relative popularity of Class cards in each card set. Besides the Hall of Fame (which usually consists of Classic cards), the Naxxramas set has an overwhelming favourite in Death’s Bite (Warrior), which was highly valued both for its strong damage and Deathrattle effect.
## Warning: Removed 4 rows containing missing values (geom_point).
So far, we have looked at cards that users tend to include in decks in Standard format for Ranked play, which is also used for official Hearthstone tournaments - making these popular cards highly visible to a wide audience. We have also looked at card popularity when broken down by various categories, such as class, time period and card set.
Are there any limitations to the data that may have affected our analysis?
The major limitation of this data is that it only looks at decks submitted to a third-party website, which brings up the following issues:
rating attribute may reflect how strong other players consider a deck, it is also biased towards the popularity of the user as well as the date of submission:
How can we expand on this analysis?